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Random versus Encrypted Data 

Introduction 

What does it mean for information to be leaked? What is information anyways? If we come across 
some data, can we tell if it contains information or not? Is it even meaningful to talk about data which 
carries zero information? These are important questions when it comes to encrypting file systems, 
because, as I will show below, there are small amounts of information leaked even in our best 
encrypted system. 

If we start by defining information as data containing a message then the opposite of information, or 
zero information, is random data, ie, data which may conform to some spectral parameters but does not 
exhibit any correlation between data elements. In order for data to carry a message, elements in the 
sequence must be correlated in some way. For example, V followed by "a" followed by "t" conveys 
the message "cat" — the order is important as is the proximity of the characters, and within the canon 
of the English language, this message elicits the image of a fuzzy creature which purrs. This 
correlation, extended over the data sequence, gives the message. Since "random" by definition means 
that there are no correlation, there can be no message. The ideal then in encryption would be to create a 
system that cannot be distinguished by any means and under any circumstances from random data. I'm 
not sure this is even possible, but I can show ways in which we fall short of the ideal. 

To illustrate these points, consider the following two strings of random hex digits: 

bl28755656bfbl0f8bedlc06614afd55 

02bf25c2cd984db308 1 80844 1 fede800 

Which sting contains random data and which contains encrypted data? This is a trick question because 
the truth is, neither does. The first string was generated using 

echo "Hi Mom" | aespipe | xxd -ps 

with password "asdijklqweruiopzxcvnm" while the second was generated using 
dd if=/dev/urandom count=16 bs=l | xxd -ps 

which is a sequence calculated by the kernel's non-blocking pseudo-random number generator (PRNG). 
PRNG's are similar to encryption algorithms in that they use mathematical formulas to generate 
subsequent elements of the sequence. They conform to certain spectral requirements (eg heads or tails 
are equally probably) and endeavor to disguise correlation, and so they act as decent approximations to 
random number sequences. A better way to generate random numbers is to use random physical events, 
like rolling dice or weather fluctuations. The kernel provides /dev/random that gathers it entropy (ie 
randomness) from the hardware; however, since this requires system activity to accumulate before it 
can deliver those random numbers, it blocks. Try using dd if=/dev/random instead of /dev/urandom in 
the examples below and you'll see how annoying the wait can be! I'll use urandom throughout, but to 
make these true tests, we should use random. 



A few more points to note before moving one: 



1) It should not matter whether the attacker knows what cypher was used in encrypting the message — 
in the above case I used 128-bit AES. The message is safe unless he also knows the secret password. A 
cracked cipher is one in which the attacker can obtain the clear message without the secret in a 
reasonable about of time. Mathematically, 

decryption = F( encrypted message, cipher, secret ) 

runs in a reasonable amount of cpu time and gives garbage unless the right secret is supplied. In our 
example, 

echo -n "M28755656bfbl0f8bedlc06614afd55" | xxd -r -ps | aespipe -d 

returns "Hi Mom" only when we supply the right secret. It uses only fractions of a second of cpu time 
and prepending "time" to the above command gives: 

real 0m7.041s 
user OmO.OOOs 
sys 0m0.028s 

In contrast, a function like this 

decryption = G( encrypted message, cipher ) 

which runs in a reasonable amount of cpu time does not exist for a good cipher. If it did and were 
found, then we would say that the cipher is cracked. Since the key size is 128-bits, this means there are 
2 A 128 = 3.4x1 0 A 3 8 possible keys to try and one and only one will decrypt the message. Since each 
attempt takes about 0.03s this means a brute force attack would take about 10 A 37s = about 10 A 19 
lifetimes of the known universe. To cover this key space, aespipe insists on a password of 20 chars long 
or more. Including upper/lower/numbers/special chars, this means 94 possibilities for each char and 
94 A 20 = 2.9x1 0 A 39. Its up to the user now to choose randomly within this password spaces — good 
luck! Since humans prefer easy to remember passwords to hard, the actual distribution of chosen 
passwords is not uniform and so brute force attacks, like John the Ripper", proceed by trying more 
probable passwords before less probably ones, leading to a dictionary attack. 

2) Since knowing the cipher should not give the attacker any advantage, some implementations, like 
the luks extension to dm-crypt announce it in a header. Here's an example obtained using the following 
commands: 

dd if=/dev/urandom of=dmcrypt-luks-A.bmp count=l bs=lM 
losetup /dev/loopO dmcrypt-luks-A.bmp 
cryptsetup luksFormat /dev/loopO 
hexdump -C /dev/loopO | more 
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The hexdump is pretty clear, but to make it even easier cryptsetup provides a utility which will present 
this information in a human readable form. The following 

cryptsetup luksDump /dev/loopO 

yields 

LUKS header information for /dev/loopO 

Version: 1 

Cipher name: aes 

Cipher mode: cbc-essiv:sha256 

Hash spec: shal 

Payload offset: 1032 

MK bits: 128 

MK digest: ca 4e ae 89 ce dO 83 9e 6b aa 2d 74 8d ae 9d 56 2e fl 63 81 
MK salt: 69 18 a9 99 b5 cf 33 47 Oe 03 G bd aO b8 15 65 
85 2e 6e f3 22 al 7a 2e 2d a4 f6 50 73 69 07 ff 
MK iterations: 10 

UUID: 9b7bb3d2-7blc-4fc9-9236-96aadl6db65e 

Key Slot 0: ENABLED 
Iterations: 268337 

Salt: ee cO 18 69 ed 2b d9 5d 6e 55 bO 53 66 8d al 13 

e9 75 a9 80 fd Oe bb 38 6c c5 4b 40 b9 c7 ee 2d 

Key material offset: 8 

AF stripes: 4000 

Key Slot 1: DISABLED 

Key Slot 2: DISABLED 

Key Slot 3: DISABLED 

Key Slot 4: DISABLED 

Key Slot 5: DISABLED 



Key Slot 6: DISABLED 
Key Slot 7: DISABLED 



While none of this information helps the attacker to decode the encrypted data which follows the 
header, it does give away some information. It says "what follows is encrypted data, it is encoded using 
128-bit AES, the mode in which this encryption is used is cipher block-chaining with initialization 
vector calculated using sha256 ..." This is harmless enough except that it diminishes plausible 
deniability meaning that if the attacker is able to coerce the victim into giving over the key, the victim 
has a more difficult time denying that there is any key to be given over and what the attacker is looking 
at is just random data. Some encryption implementations, like truecrypt, approach plausible deniability 
by creating a "hidden" volume. In this approach, there are two passwords, one which unlocks the true 
hidden volume and another which unlocks a fake volume. When coerced, the victim can give the key to 
unlock the fake volume while remaining silent about the true volume. 

If we take the ideal of "zero information loss" then any luks-type header which discloses the metadata 
of the encrypted data compromises plausible deniability. In this case, implementations like dm-crypt 
without luks extensions or loop-aes are preferred. However, in cases where plausible deniability is not 
an issue, then a luks-type header has advantages, eg. the hal/dbus system in Gnome recognizes the luks 
header and initiates a dialog where the user can enter the password and obtain the decrypted volume as 
an icon on the desktop when he plugs in an encrypted pen drive. If all the potential victim is trying to 
protect against is theft, then deniability is not an issue. But if it comes to a repressive regime trying to 
commit human right abuses, then plausible deniability and zero information take on a new dimension. 



Cryptanalysis with bitmaps 

Let's return to the original question armed with the above knowledge. Is it possible to recognize the 
difference between (pseudo) random data and encrypted data even in the absence of any metadata? I'll 
demonstrate below that in some circumstances the answer is yes. To make my points as obvious as 
possible, 111 use bitmap images to demonstrate what's going on. Let's start with our reference systems 
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Figl 



These images were created using The Gimp as 200x200 24-bit bmp images. (I had to render them as 
jpg to put on the web page, but that was the very last step. The original bitmaps can be obtained here .) 



The B.bmp file represents pure white, and its body is made up of all hex values FF. A.bmp is also 
mostly FF's except for the black A which are 00 f s. There are also some transitional values at the boarder 
between the black A and and white background. The image A-B.bmp was created by copying all of 
B.bmp to the clipboard, and pasting it into A as a new layer. The two layers were then "differenced" 
using the layer panel (you can bring it up using Windows => Dockable Dialogs => Layers). 

Here are our reference random data images: 




Fig 2 



These images were create using the following: 

dd if=/dev/urandom of=random.bmp count= 120054 bs=l 
dd if= A.bmp of=random.bmp count=54 bs=l conv=notrunc 

The first line creates a random file the size of A.bmp and the second line adds the 54 byte bmp header. 
It is identical for all our bitmaps of the same size so we can just lift it from A.bmp. I added this header 
for all of the encrypted or random files below as a final step to render the file as a viewable bitmap. 

Til use this poor man's cryptanalysis to look at some implementations of encryption. I'll start with poor 
implementations and work towards our best practices. Using bitmaps is not rigorous, but its a nice 
technique to illustrate what we're looking for. At the time of this writing, I'm coding up a suite of test 
based off of Knuth's work on random number. Stay tuned! 



Why embed encrypted data within random data and why using cipher block chaining 

Almost all software which installs encrypted systems will lay down random data on the device first and 
default to some strong cipher (eg 128-bit aes) implemented with cipher block chaining. Let's use our 
bitmap cryptanalysis to see what happens if you don't. 

First let's put A.bmp within an unencrypted ext2 file system for reference — where do the blocks of 
A.bmp get put inside a filesystem? I produced this image as follows: 



dd if=/dev/zero of=ext2-A.bmp count=480054 bs=l 

losetup /dev/loopO ext2-A.bmp 

mke2fs /dev/loopO 

mount /dev/loopO zzz/ 

cp A.bmp zzz/ 

umount zzz/ 

losetup -d /dev/loopO 

dd if=A400.bmp of=ext2-A.bmp count=54 bs=l conv=notrunc 

Note thatA400.bmp is a 400x400 24-bit bmp which is 480054 bytes in size. The larger size 
accommodates a larger file for the ext2 filesystem so that one can copy A.bmp into it without running 
out of device space. It is also commensurate with the 200x200 bmp so that the A doesn't get wrapped 
unrecognizeably when the file system is rendered as a bmp image. 

Let's similarly put A.bmp into a poorly encrypted ext2 file system. Here we make two mistakes: 1) we 
do not first fill up the file system with (pseudo) random data before encrypting and formatting, and 
secondly we don't use any cipher block chaining. This image was produced as follows 

dd if=/dev/zero of=ext2-encl -A.bmp count=480054 bs=l 

losetup /dev/loopO ext2-encl-A.bmp 

cryptsetup -c aes-ecb create ext2-encl-A.bmp /dev/loopO 

mke2fs /dev/mapper/ext2-enc 1 -A.bmp 

mount /dev/mapper/ext2-encl -A.bmp zzz/ 

cp A.bmp zzz/ 

umount zzz/ 

dmsetup remove ext2-encl-A.bmp 
losetup -d /dev/loopO 

dd if= A400.bmp of=ext2-encl -A.bmp count=54 bs=l conv=notrunc 

Since dd pulls its data out of /dev/zero, the file system stats with a base of all zeros below the 
encryption layer. Also, cryptsetup -c aes-ecb sets up a dm-crypt layer without any chaining, ie, each 
block is encrypted independently of any other block, so a clear block of all 0's is always encrypted into 
the same encrypted block. The pattern of the file within the file system clearly emerges. 

Finally, let's repeat the above, but this time we'll correct one of our mistakes. We'll lay down random 
data under the encryption layer, but we'll still use aes-ecb. 
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Fig 3 



Its clear what's happening here. The since each block is encrypted independently of others, the 
underlying structure comes through. We are no where near the ideal of zero information. 



Dm-crypfs default aes-cbc-essiv:sha256 with random filler 

Laying down a random background and chaining is clearly important. There are several techniques of 
chaining, but the underlying idea is similar. In CBC mode, the most popular form of chaining, a block 
is XORed with the previous encrypted block before it itself is encrypted, thus creating a "chain". An 
initialization vector is used for the very first block. Since blocks are not encrypted independently of 
other blocks, a block of say all 0 f s will not always be encrypted in the same way, and the structure of 
the underlying clear file system doesn't emerge as obviously as in ecb. Let's repeat the previous 
experiment, but this time, we'll use aes-cbc-essiv:sha256. We'll still use all zeros and random data for 
the unencrypted background for comparison. Here's what one gets: 
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Fig 4 



The importance of a random background is apparent in this example. The chaining certainly obscure 
the structure of the underlying file system, but the outline of large regions of empty space are still 
discernible. When these empty regions are filled with random data, it becomes nearly impossible to tell 
what's encrypted and what is random. In fact, the 128-bit aes-cbc-essiv:sha256 passes all the standard 
tests for random number on both a local and global scale — this is the subject of another writeup that 
Til post another time. 



Information leak despite random filler and chaining 

At this point the reader may think he's safe and arrived at "zero information loss", but unfortunately, 
there is still another kind of attack that can be launched. This one depends on the attacker being able to 
watch the encrypted file system at snapshots in time. This might happen, for instance, if the victim 
backs up his data to a hard drive which he then stores off site. If the attacker sneaks between backups 
and images the disks, it then become possible for him to launch this king of attacks. 

Here are three demonstrations of this attack: 

a) aespipe is able to operate in several modes. One mode is a 128-bit aes-cbc with a simple 
initialization vector and one password. Another is multi-key- v3 mode, which also employs 128-bit aes- 
cbc, but uses 64 different keys to encrypt the blocks — the first key for the first block, the second for 
the second and so on in a cyclical fashion. It also uses a 65th key plus MD5 for the initialization vector. 
Still, if an attacker has access to a file system before and after the addition of data, evidence emerges of 
the underlying encryption. 

The following bitmaps were generated to illustrate the single key simple initialization vector mode: 

aespipe < A.bmp > aesA.bmp 
aespipe < B.bmp > aesB.bmp 

dd if= A.bmp of=aes A.bmp count=54 bs=l conv=notrunc 
dd if= B.bmp of=aesB.bmp count=54 bs=l conv=notrunc 

The Gimp was then used as described above to produce aesA-aesB.bmp. Here are the results: 




One can compare these to bitmaps for multi-key- v3. First we generate a file containing the 65 keys: 

head -c 2925 /dev/urandom | uuencode -m - | head -n 66 | tail -n 65 | gpg —symmetric -a > 
key.gpg 

And then we use the key file to produce the encrypted bitmaps: 

aespipe -K key.gpg < A.bmp > aesA-v3.bmp 

aespipe -K key.gpg < B.bmp > aesB-v3.bmp 

dd if= A.bmp of=aes A- v3.bmp count=54 bs=l conv=notrunc 

dd if= A.bmp of=aesB- v3.bmp count=54 bs=l conv=notrunc 



Here are the results: 




Fig 6 



Its clear that the change to the file system is better blurred but there is still some information to be 
gained about the underlying unencrypted data. 

b) Let's try the same test using 128-bit aes-cbc-essiv:sha256 on a random background. Here well 
employ luks extensions for convenience, but this is not necessary, nor does it change our conclusions: 

dd if=/dev/urandom of=ext2-enc5- A.bmp count= 1080054 bs=l 

losetup /dev/loopO ext2-enc5-A.bmp 

cryptsetup luksFormat /dev/loopO 

cryptsetup luksOpen /dev/loopO ext2-enc5-A.bmp 

mke2fs /dev/mapper/ext2-enc5-A.bmp 

mount /dev/mapper/ext2-enc5-A.bmp zzz/ 

cp A.bmp zzz/ 

umount zzz/ 

dmsetup remove ext2-enc5-A.bmp 
losetup -d /dev/loopO 



cp ext2-enc5-A.bmp ext2-enc5-B.bmp 

losetup /dev/loopO ext2-enc5-B.bmp 

cryptsetup luksOpen /dev/loopO ext2-enc5-B.bmp 

mount /dev/mapper/ext2-enc5-B.bmp zzz/ 

cp B.bmp zzz/A.bmp 

umount zzz/ 

dmsetup remove ext2-enc5-B.bmp 
losetup -d /dev/loopO 

dd if=A600.bmp of=ext2-enc5-A.bmp count=54 bs=l conv=notrunc 
dd if=A600.bmp of=ext2-enc5 -B.bmp count=54 bs=l conv=notrunc 



Here are the results: 
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Fig 7 



c) To fairly compare loop-aes f s multikey-v3 with dmcrypt aes-cbc-essiv:sha256, we repeat the previous 
test with loop-aes. 

dd if=/dev/urandom of=ext2-enc6- A.bmp count= 1080054 bs=l 

losetup -e AES128 -K key.gpg /dev/loopO ext2-enc6-A.bmp 

mke2fs /dev/loopO 

mount /dev/loopO zzz/ 

cp A.bmp zzz/ 

umount zzz/ 

losetup -d /dev/loopO 

cp ext2-enc6-A.bmp ext2-enc6-B.bmp 

losetup -e AES128 -K key.gpg /dev/loopO ext2-enc6-B.bmp 

mount /dev/loopO zzz/ 

cp B.bmp zzz/A.bmp 

umount zzz/ 

losetup -d /dev/loopO 



dd if=A600.bmp of=ext2-enc6-A.bmp count=54 bs=l conv=notrunc 
dd if=A600.bmp of=ext2-enc6-B.bmp count=54 bs=l conv=notrunc 



Here are the results: 







■-. ■ • 


- . . 
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Fig 8 



If we zoom in on the bands, we can see that multikey-v3 loop-aes does a better job than aes-cbc- 
essiv:sha256 dmcrypt and hiding the underlying data: 




Fig 9 



Best Practice 

Now that we know more about how information can leak from an encrypted file system, we can 
conclude with some advice for best practices. In addition to chosing multikey-v3 loop-aes, to avoid 
exposing the underlying structure of the clear file system, one can try filling up the empty space with 
(pseudo) random data. To illustrate, we repeated the previous example, but after copying A.bmp into 
the filesystem, we filled up the remaining empty space as follows: 



cd zzz 

dd if=/dev/urandom of=waste 



When we remount the system, we first clear the space by removing "waste", do our work and then fill it 
back up again with (pseudo) random data: 

cd zzz 
rm waste 

cp ../B.bmp A.bmp 

dd if=/dev/urandom of=waste 




Some of the underlying structure is still visible, but it is clear that we're getting closer to the situation in 
Fig 2. 

Finally, let's go one more step. How about scattering A.bmp through the blocks of the file system, in 
other words, fragmenting it? Usually one wants to avoid fragmentation because it adds latency to 10 
during hard drive seeks. This is not an issue with solid state devices, such as pen drives or SSD's and so 
we should consider what benefit is gained in terms of encryption. First, let's see what happens to A.bmp 
when it is fragmented on an unencrypted drive. We do this by creating lots of little files and deleting 
every tenth one (and it bit more) before copying in A.bmp: 

dd if=/dev/urandom of=ext2-frag.bmp count= 1080054 bs=l 
losetup /dev/loopO ext2-frag.bmp 
mke2fs -N 1 024 /dev/loopO 
mount /dev/loopO zzz 
cd zzz 

for i in $(seq 1 1024) ; do dd if=/dev/zero of=$i. waste count=l bs=lk ; done 

rm -f *7.waste 

rm -f ?7?.waste 

cp ../A.bmp . 

cd.. 



umount zzz 

dd if=A600.bmp of=ext2-frag.bmp count=54 bs=l conv=notrunc 



Here is the result: 




Fig 11 



So let's modify our technique of filling the empty space with random data by further forcing the 
fragmenation of A.bmp and B.bmp. 

dd if=/dev/urandom of=aes-ext2-frag-A.bmp count=l 080054 bs=l 
losetup -e AES128 -K key.gpg /dev/loopO aes-ext2-frag-A.bmp 
mke2fs -N 1 024 /dev/loopO 
mount /dev/loopO zzz/ 
cd zzz/ 

for i in $(seq 1 1024) ; do dd if=/dev/urandom of=$i. waste count=l bs=lk ; done 
rm -f *4.waste *7.waste 
cp ../A.bmp . 

for i in $(seq 1025 2048) ; do dd if=/dev/urandom of=$i. waste count=l bs=lk ; done 
cd.. 

umount zzz/ 
losetup -d /dev/loopO 

cp aes-ext2-frag-A.bmp aes-ext2-frag-B.bmp 

losetup -e AES128 -K key.gpg /dev/loopO aes-ext2-frag-B.bmp 

mount /dev/loopO zzz/ 

cd zzz/ 

cp ../B.bmp A.bmp 
rm -f *. waste 
cd.. 

umount zzz/ 
losetup -d /dev/loopO 



Using our subtraction attack, here's what we find: 




The black bands (regions of no change) are still visible as in Fig 10, but are more scattered reflecting 
the fragmentation of the underlying file. It is unlikely we can totally eliminate these regions of 
unchanged data because they probably represent the file system's metadata, ie, the i-nodes and 
superblocks. 



